PRESCAL Updating Design

Original work: https://github.com/dongwookim-ml/almc
Paper: https://arxiv.org/pdf/1608.05921.pdf

To speed up the algorithm, we re-design the updating steps.

We consider:

We might do not need to update posteriors for all entities and relations.
For a sequential updating algorithm (Thompson sampling), it doesn't make sense to use all observed labels in each iteration. i.e.

$$P_t = l({x}_t) P_{t-1}$$

where $x_t$ is the label observed in $t^{th}$ iteration.

Based on the above consideration, we come up with the following design ideas:

Design 1

Assume we observe $x_{ijk}$ in $t^{th}$ iteration, we only update the posterior of $e_i, e_j, r_k$ using the new label $x_{ijk}$.

$\textbf{Prior}$:

$$P(\mathbf{e_i}|\sigma_e) = \mathcal{N}(\mathbf{e_i}| \mathbf{u_e}, {\sigma_e}^2 I_D)$$$$P(\mathbf{R_k}|\sigma_r) = \mathcal{MN}(\mathbf{R_k}| \mathbf{u_r}, {\sigma_r} I_D, {\sigma_r} I_D)$$

or eqivalently, $$P(\mathbf{r_k}|\sigma_r) = \mathcal{N}(\mathbf{r_k}| \mathbf{u_r}, {\sigma_r}^2 I_{D^2})$$ where $r_k = vec(R_k) \in \mathcal{R}^{D^2,1}$

$\textbf{Likelihood}$:

$$p(x_{ikj}|\mathbf{e_i, e_j}, R_k) = \mathcal{N}(x_{ikj}| \mathbf{e_i}^T R_k \mathbf{e_j}, \sigma_x^2)$$

using the identity $\mathbf{e_i}^T R_k \mathbf{e_j} = r_k^T \mathbf{e_i} \otimes \mathbf{e_j}$, $$p(x_{ikj}|\mathbf{e_i, e_j, r_k}) = \mathcal{N}(x_{ikj}| \mathbf{r_k}^T \mathbf{e_i} \otimes \mathbf{e_j}, \sigma_x^2)$$

$\textbf{Entity Posterior}$:

$$P(\mathbf{e_i}|x_{ikj}, \mathbf{e_j}, R_k, \sigma_e) = \mathcal{N}(\mathbf{e_i}| m_{eN}, s_{eN}) \propto P(\mathbf{e_i}|\sigma_e)P(x_{ikj}|\mathbf{e_i, e_j}, R_k) = \mathcal{N}(\mathbf{e_i}| \mathbf{u_e}, {\sigma_e}^2 I_D) \mathcal{N}(x_{ikj}| \mathbf{e_i}^T R_k \mathbf{e_j}, \sigma_x^2)$$

We know for $c \mathcal{N}(\mathbf{x|c, C}) = \mathcal{N}(\mathbf{x|a, A})\mathcal{N}(\mathbf{x|b, B})$,

\begin{equation} \mathbf{C = {(A^{-1} + B ^{-1)})}^{-1}} \end{equation}$$\mathbf{c = C(A^{-1}a + B^{-1}b)}$$

So the goal is to transform $\mathcal{N}(x_{ikj}| r_k^T \mathbf{e_i} \otimes \mathbf{e_j}, \sigma_x^2)$ into $\mathcal{N}(\mathbf{e_i}| M x_{ikj}, \sigma_x^2 MM^T)$

Assume $R_k \mathbf{e_j}$ is column full rank, $$x_{ikj} = \mathbf{e_i^T}R_k\mathbf{e_j} \Leftrightarrow \mathbf{e_i} = (R_k \mathbf{e_j})^{-T}x_{ikj}$$

$$\mathcal{N}(\mathbf{e_i}| M x_{ikj}, \sigma_x^2 MM^T) = \mathcal{N}(\mathbf{e_i}| (R_k \mathbf{e_j})^{-T}x_{ikj}, \sigma_x^2 ((R_k \mathbf{e_j})(R_k \mathbf{e_j})^T)^{-1})$$

$$s_{eN} = (\sigma_e^{-2} I_D + \sigma_x^{-2} (R_k \mathbf{e_j})(R_k \mathbf{e_j})^T)^{-1}$$

$$m_{eN} = s_{eN} (\sigma_e^{-2} \mathbf{u_e} + \sigma_x^{-2} (R_k \mathbf{e_j}) x_{ikj} )$$

Similarly, assum $\mathbf{e_i}^T R_k$ is column full rank, for $P(e_j|x_{ikj}, \mathbf{e_i}, R_k, \sigma_e)$ we have

$$s_{eN} = (\sigma_e^{-2} I_D + \sigma_x^{-2} (\mathbf{e_i}^T R_k)^T(\mathbf{e_i}^T R_k))^{-1}$$$$m_{eN} = s_{eN} (\sigma_e^{-2} \mathbf{u_e} + \sigma_x^{-2} (\mathbf{e_i}^T R_k)^T x_{ikj} )$$

$\textbf{Relation Posterior}:$

$$P(\mathbf{r_k}|x_{ikj}, \mathbf{e_i, e_j}, \sigma_r) = \mathcal{N}(\mathbf{r_k}|m_{rN}, s_{rN}) \propto P(\mathbf{r_k|\sigma_r})P(x_{ikj}|\mathbf{e_i, e_j, r_k}) = \mathcal{N}(\mathbf{r_k}| \mathbf{u_r}, {\sigma_r}^2 I_{D^2}) \mathcal{N}(x_{ikj}| \mathbf{r_k}^T \mathbf{e_i} \otimes \mathbf{e_j}, \sigma_x^2)$$

Similarly, assume $\mathbf{e_i} \otimes \mathbf{e_j}$ is column full rank,

$$ x_{ikj} = \mathbf{r_k}^T \mathbf{e_i} \otimes \mathbf{e_j} \Leftrightarrow \mathbf{r_k} = (\mathbf{e_i} \otimes \mathbf{e_j}) ^{-T} x_{ikj}$$

$$\mathcal{N}(\mathbf{r_k}|M x_{ikj}, \sigma^2 MM^T) = \mathcal{N}(\mathbf{r_k}| (\mathbf{e_i} \otimes \mathbf{e_j}) ^{-T} x_{ikj}, \sigma_x^{2} ((\mathbf{e_i} \otimes \mathbf{e_j}) (\mathbf{e_i} \otimes \mathbf{e_j}) ^T)^{-1} )$$

$$s_{rN} = (\sigma_r^{-2}I_D + \sigma_x^{-2} (\mathbf{e_i} \otimes \mathbf{e_j}) (\mathbf{e_i} \otimes \mathbf{e_j}) ^T)^{-1}$$

$$m_{rN} = s_{rN}(\sigma_r^{-2} \mathbf{u_r} + \sigma_x^{-2} (\mathbf{e_i} \otimes \mathbf{e_j}))$$

Design 2



In [ ]: